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BACKGROUND OF THE INVENTION 
Field of the Invention 

[0009] The present invention relates generally to microprocessors, and more 

specifically to a RISC microprocessor having plural, symmetrical sets of registers. 

Background 

[0010] In addition to the usual complement of main memory storage and 

secondary permanent storage, a microprocessor-based computer system typically 
also includes one or more general purpose data registers, one or more address 
registers, and one or more status flags. Previous systems have included integer 
registers for holding integer data and floating point registers for holding floating 
point data. Typically, the status flags are used for indicating certain conditions 
resulting from the most recently executed operation. There generally are status 
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flags for indicating whether, in the previous operation: a carry occurred, a 
negative number resulted, and/or a zero resulted. 
[0011] These flags prove useful in determining the outcome of conditional 

branching within the flow of program control. For example, if it is desired to 
compare a first number to a second number and upon the conditions that the two 
are equal, to branch to a given subroutine, the microprocessor may compare the 
two numbers by subtracting one from the other, and setting or clearing the 

0 appropriate condition flags. The numerical value of the result of the subtraction 
J= need not be stored. A conditional branch instruction may then be executed, 
J3 conditioned upon the status of the zero flag. While being simple to implement, 
S this scheme lacks flexibility and power. Once the comparison has been performed, 

1 * no further numerical or other operations may be performed before the conditional 

branch upon the appropriate flag; otherwise, the intervening instructions will 
kJ overwrite the condition flag values resulting from the comparison, likely causing 

K erroneous branching. The scheme is further complicated by the fact that it may 

^ be desirable to form greatly complex tests for branching, rather than the simple 

equality example given above. 
[0012] For example, assume that the program should branch to the subroutine 

only upon the condition that a first number is greater than a second number, and 
a third number is less than a fourth number, and a fifth number is equal to a sixth 
number. It would be necessary for previous microprocessors to perform a lengthy 
series of comparisons heavily interspersed with conditional branches. A 
particularly undesirable feature of this serial scheme of comparing and branching 
is observed in any microprocessor having an instruction pipeline. 
[001 3] In a pipelined microprocessor, more than one instruction is being executed 

at any given time, with the plural instructions being in different stages of execution 
at any given moment. This provides for vastly improved throughput. A typical 
pipeline microprocessor may include pipeline stages for: (a) fetching an 
instruction, (b) decoding the instruction, (c) obtaining the instruction's operands, 
(d) executing the instruction, and (e) storing the results. The problem arises when 
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a conditional branch instruction is fetched. It may be the case that the conditional 
branch's condition cannot yet be tested, as the operands may not yet be calculated, 
if they are to result from operations which are yet in the pipeline. This results in 
a "pipeline stall," which dramatically slows down the processor. 
[0014] Another shortcoming of previous microprocessor-based systems is that 

they have included only a single set of registers of any given data type. In 
previous architectures, when an increased number of registers has been desired 
within a given data type, the solution has been simply to increase the size of the 
single set of those type of registers. This may result in addressing problems, 
access conflict problems, and symmetry problems. 
[0015] On a similar note, previous architectures have restricted each given register 

set to one respective numerical data type. Various prior systems have allowed 
general purpose registers to hold either numerical data or address "data," but the 
present application will not use the term "data" to include addresses. What is 
intended may be best understood with reference to two prior systems. The Intel 
8085 microprocessor includes a register pair "HL" which can be used to hold 
either two bytes of numerical data or one two-byte address. The present 
application's improvement is not directed to that issue. More on point, the Intel 
80486 microprocessor includes a set of general purpose integer data registers and 
a set of floating point registers, with each set being limited to its respective data 
type, at least for purposes of direct register usage by arithmetic and logic units. 
[0016] This proves wasteful of the microprocessor's resources, such as the 

available silicon area, when the microprocessor is performing operations which do 
not involve both data types. For example, user applications frequently involve 
exclusively integer operations, and perform no floating point operations 
whatsoever. When such a user application is run on a previous microprocessor 
which includes floating point registers (such as the 80486), those floating point 
registers remain idle during the entire execution. 
[0017] Another problem with previous microprocessor register set architecture 

is observed in context switching or state switching between a user application and 



a higher access privilege level entity such as the operating system kernel. When 
control within the microprocessor switches context, mode, or state, the operating 
system kernel or other entity to which control is passed typically does not operate 
on the same data which the user application has been operating on. Thus, the data 
registers typically hold data values which are not useful to the new control entity 
but which must be maintained until the user application is resumed. The kernel 
must generally have registers for its own use, but typically has no way of knowing 
which registers are presently in use by the user application. In order to make 
space for its own data, the kernel must swap out or otherwise store the contents 
of a predetermined subset of the registers. This results in considerable loss of 
processing time to overhead, especially if the kernel makes repeated, short- 
duration assertions of control. 

On a related note, in prior microprocessors, when it is required that a 
"grand scale" context switch be made, it has been necessary for the 
microprocessor to expend even greater amounts of processing resources, including 
a generally large number of processing cycles, to save all data and state 
information before making the switch. When context is switched back, the same 
performance penalty has previously been paid, to restore the system to its former 
state. For example, if a microprocessor is executing two user applications, each 
of which requires the full complement of registers of each data type, and each of 
which may be in various stages of condition code setting operations or numerical 
calculations, each switch from one user application to the other necessarily 
involves swapping or otherwise saving the contents of every data register and state 
flag in the system. This obviously involves a great deal of operational overhead, 
resulting in significant performance degradation, particularly if the main or the 
secondary storage to which the registers must be saved is significantly slower than 
the microprocessor itself. 

Therefore, we have discovered that it is desirable to have an improved 
microprocessor architecture which allows the various component conditions of a 
complex condition to be calculated without any intervening conditional branches. 



We have farther discovered that it is desirable that the plural simple conditions be 
calculable in parallel, to improve throughput of the microprocessor. 

[0020] We have also discovered that it is desirable to have an architecture which 

allows multiple register sets within a given data type. 

[0021] Additionally, we have discovered it to be desirable for a microprocessor's 

floating point registers to be usable as integer registers, in case the available 
integer registers are inadequate to optimally to hold the necessary amount of 
integer data. Notably, we have discovered that it is desirable that such re-typing 
be completely transparent to the user application. 

[0022] We have discovered it to be highly desirable to have a microprocessor 

which provides a dedicated subset of registers which are reserved for use by the 
kernel in lieu of at least a subset of the user registers, and that this new set of 
registers should be addressable in exactly the same manner as the register subset 
which they replace, in order that the kernel may use the same register addressing 
scheme as user applications. We have further observed that it is desirable that the 
switch between the two subsets of registers require no microprocessor overhead 
cycles, in order to maximally utilize the microprocessor's resources. 

[0023] Also, we have discovered it to be desirable to have a microprocessor 

architecture which allows for a "grand scale" context switch to be performed with 
minimal overhead. In this vein, we have discovered that is desirable to have an 
architecture which allows for plural banks of register sets of each type, such that 
two or more user applications may be operating in a multi-tasking environment, 
or other "simultaneous" mode, with each user application having sole access to at 
least a full bank of registers. It is our discovery that the register addressing 
scheme should, desirably, not differ between user applications, nor between 
register banks, to maximize simplicity of the user applications, and that the system 
should provide hardware support for switching between the register banks so that 
the user applications need not be aware of which register bank which they are 
presently using or even of the existence of other register banks or of other user 
applications. 



[0024] These and other advantages of our invention will be appreciated with 

reference to the following description of our invention, the accompanying 
drawings, and the claims. 

SUMMARY OF THE INVENTION 

[0025] The present invention provides a register file system comprising: an 

integer register set including first and second subsets of integer registers, and a 
shadow subset; are-typable set of registers which are individually usable as integer 
registers or as floating point registers; and a set of individually addressable 
Boolean registers. 

[0026] The present invention includes integer and floating point functional units 

which execute integer instructions accessing the integer register set, and which 
operate in a plurality of modes. In any mode, instructions are granted ordinary 
access to the first subset of integer registers. In a first mode, instructions are also 
granted ordinary access to the second subset. However, in a second mode, 
instructions attempting to access the second subset are instead granted access to 
the shadow subset, in a manner which is transparent to the instructions. Thus, 
routines may be written without regard to which mode they will operate in, and 
system routines (which operate in the second mode) can have at least the second 
subset seemingly at their disposal, without having to expend the otherwise- 
required overhead of saving the second subset's contents (which may be in use by 
user processes operating in the first mode). 

[0027] The invention further includes aplurality of integer register sets, which are 

individually addressable as specified by fields in instructions. The register sets 
include read ports and write ports which are accessed by multiplexers, wherein the 
multiplexers are controlled by contents of the register set-specifying fields in the 
instructions. 

[0028] One of the integer register sets is also usable as a floating point register 

set. In one embodiment, this set is sixty-four bits wide to hold double-precision 
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floating point data, but only the low order thirty-two bits are used by integer 
instructions. 

[0029] The invention includes functional units for performing Boolean operations, 

and further includes a Boolean register set for holding results of the Boolean 
operations such that no dedicated, fixed-location status flags are required. The 
integer and floating point functional units execute numerical comparison 
instructions, which specify individual ones of the Boolean registers to hold results 
of the comparisons. A Boolean functional unit executes Boolean combinational 
instructions whose sources and destination are specified registers in the Boolean 
register set. Thus, the present invention may perform conditional branches upon 
a single result of a complex Boolean function without intervening conditional 
branch instructions between the fundamental parts of the complex Boolean 
function, minimizing pipeline disruption in the data processor. 
W [0030] Finally, there are multiple, identical register banks in the system, each bank 

including the above-described register sets. A bank may be allocated to a given 
process or routine, such that the instructions within the routine need not specify 
upon which bank they operate. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0031] FIG. 1 is a block diagram of the instruction execution unit of the 

microprocessor of the present invention, showing the elements of the register file. 
[0032] FIGS. 2, 2A, 3, 3 A and 4 are simplified schematic and block diagrams of 

the floating point, integer and Boolean portions of the instruction execution unit 
of FIG. 1, respectively. 
[0033] FIGS. 5-6 are more detailed views of the floating point and integer 

portions, respectively, showing the means for selecting between register sets. 
[0034] FIG. 7 illustrates the fields of an exemplary microprocessor instruction 

word executable by the instruction execution unit of FIG. 1. 
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DETAILED DESCRIPTION OF THE INVENTION 
I. Register File 

[0035] FIG. 1 illustrates the basic components of the instruction execution unit 

(IEU) 10 of the RISC (reduced instruction set computing) processor of the 
present invention. The IEU 1 0 includes a register file 1 2 and an execution engine 
14. The register file 12 includes one or more register banks 16-0 to 16-n. It will 
be understood that the structure of each register bank 16 is identical to all of the 
other register banks 16. Therefore, the present application will describe only 
register bank 1 6-0. The register bank includes a register set A 1 8, a register set FB 
20, and a register set C 22. 

[0036] In general, the invention may be characterized as a RISC microprocessor 

having a register file optimally configured for use in the execution of RISC 
instructions, as opposed to conventional register files which are sufficient for use 
in the execution of CISC (complex instruction set computing) instructions by 
CISC processors. By having a specially adapted register file, the execution engine 
of the microprocessor's IEU achieves greatly improved performance, both in terms 
of resource utilization and in terms of raw throughput. The general concept is to 
tune a register set to a RISC instruction, while the specific implementation may 
involve any of the register sets in the architecture. 

A. Register Set A 

[0037] Register set A 1 8 includes integer registers 24 (RA[3 1 :0]) ? each of which 

is adapted to hold an integer value datum. In one embodiment, each integer may 
be thirty-two bits wide. The RA[ ] integer registers 24 include a first plurality 26 
of integer registers (RA[23:0]) and a second plurality 28 of integer registers 
(RA[3 1 :24]). The RA[ ] integer registers 24 are each of identical structure, and 
are each addressable in the same manner, albeit with a unique address within the 
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integer register set 24. For example, a first integer register 30 (RA[0]) is 
addressable at a zero offset within the integer register set 24. 

[0038] RA[0] always contains the value zero. It has been observed that user 

applications and other programs use the constant value zero more than any other 
constant value. It is, therefore, desirable to have a zero readily available at all 
times, for clearing, comparing, and other purposes. Another advantage of having 
a constant, hard- wired value in a given register, regardless of the particular value, 
is that the given register may be used as the destination of any instruction whose 
results need not be saved. 

[0039] Also, this means that the fixed register will never be the cause of a data 

dependency delay. A data dependency exists when a "slave" instruction requires, 
for one or more of its operands, the result of a "master" instruction. In a pipelined 
processor, this may cause pipeline stalls. For example, the master instruction, 
although occurring earlier in the code sequence than the slave instruction, may 
take considerably longer to execute. It will be readily appreciated that if a slave 
"increment and store" instruction operates on the result data of a master 
"quadruple-word integer divide" instruction, the slave instruction will be fetched, 
decoded, and awaiting execution many clock cycles before the master instruction 
has finished execution. However, in certain instances, the numerical result of a 
master instruction is not needed, and the master instruction is executed for some 
other purpose only, such as to set condition code flags. If the master instruction's 
destination is RA[0], the numerical results will be effectively discarded. The data 
dependency checker (not shown) of the IEU 1 0 will not cause the slave instruction 
to be delayed, as the ultimate result of the master instruction— zero—is already 
known. 

[0040] The integer register set A 24 also includes a set of shadow registers 32 

(RT[31:24]). Each shadow register can hold an integer value, and is, in one 
embodiment, also thirty-two bits wide. Each shadow register is addressable as an 
offset in the same manner in which each integer register is addressable. 
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[0041] Finally, the register set A includes an IEU mode integer switch 34. The 

switch 34, like other such elements, need not have a physical embodiment as a 
switch, so long as the corresponding logical functionality is provided within the 
register sets. The IEU mode integer switch 34 is coupled to the first subset 26 of 
integer registers on line 36, to the second subset of integer registers 28 on line 38, 
and to the shadow registers 32 on line 40. All accesses to the register set A 1 8 are 

y, made through the IEU mode integer switch 34 on line 42. Any access request to 

read or write a register in the first subset RA[23:0] is passed automatically 

yl through the IEU mode integer switch 34. However, accesses to an integer 

o 

p register with an offset outside the first subset RA[23 :0] will be directed either to 

JJ{ the second subset RA[3 1 :24] or the shadow registers RT[3 1 :24], depending upon 

^ the operational mode of the execution engine 14. 

h& [0042] The IEU mode integer switch 34 is responsive to a mode control unit 44 

is; 

i* in the execution engine 14. The mode control unit 44 provides pertinent state or 

P mode information about the IEU 1 0 to the IEU mode integer switch 34 on line 46. 

When the execution engine performs a context switch such as a transfer to kernel 
mode, the mode control unit 44 controls the IEU mode integer switch 34 such that 
any requests to the second subset RA[31:24] are re-directed to the shadow 
RT[3 1 :24], using the same requested offset within the integer set. Any operating 
system kernel or other then-executing entity may thus have apparent access to the 
second subset RA[3 1 :24] without the otherwise-required overhead of swapping 
the contents of the second subset RA[3 1 :24] out to main memory, or pushing the 
second subset RA[31:24] onto a stack, or other conventional register-saving 
technique. 

[0043] When the execution engine 14 returns to normal user mode and control 

passes to the originally-executing user application, the mode control unit 44 
controls the IEU mode integer switch 34 such that access is again directed to the 
second subset RA[31:24]. In one embodiment, the mode control unit 44 is 
responsive to the present state of interrupt enablement in the IEU 10. In one 
embodiment, the execution engine 14 includes a processor status register (PSR) 
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(not shown), which includes a one-bit flag (PSR[7]) indicating whether interrupts 
are enabled or disabled. Thus, the line 46 may simply couple the IEU mode 
integer switch 34 to the interrupts-enabled flag in the PSR. While interrupts are 
disabled, the IEU 10 maintains access to the integers RA[23:0], in order that it 
may readily perform analysis of various data of the user application. This may 
allow improved debugging, error reporting, or system performance analysis. 

B. Register Set FB 

The re-typable register set FB 20 may be thought of as including floating 
point registers 48 (RF[31:0]); and/or integer registers 50 (RB[31:0]). When 
neither data type is implied to the exclusion of the other, this application will use 
the term RFB[ ]. In one embodiment, the floating point registers RF[ ] occupy the 
same physical silicon space as the integer registers RB [ ] . In one embodiment, the 
floating point registers RF[ ] are sixty-four bits wide and the integer registers 
RB[ ] are thirty -two bits wide. It will be understood that if double-precision 
floating point numbers are not required, the register set RFB[ ] may 
advantageously be constructed in a thirty-two-bit width to save the silicon area 
otherwise required by the extra thirty-two bits of each floating point register. 

Each individual register in the register set RFB [ ] may hold either a floating 
point value or an integer value. The register set RFB[ ] may include optional 
hardware for preventing accidental access of a floating point value as though it 
were an integer value, and vice versa. In one embodiment, however, in the 
interest of simplifying the register set RFB[ ], it is simply left to the software 
designer to ensure that no erroneous usages of individual registers are made. 
Thus, the execution engine 14 simply makes an access request on line 52, 
specifying an offset into the register set RFB[ ], without specifying whether the 
register at the given offset is intended to be used as a floating point register or an 
integer register. Within the execution engine 14, various entities may use either 
the full sixty-four bits provided by the register set RFB [ ], or may use only the low 
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order thirty-two bits, such as in integer operations or single-precision floating 
point operations. 

[0046] A first register RFB [0] 5 1 contains the constant value zero, in a form such 

that RB[0] is a thirty-two-bit integer zero (0000 hex ) and RF[0] is a sixty-four-bit 
floating point zero (00000000 hex ). This provides the same advantages as described 
above for RA[0]. 

p c. Register Set C 

2 

p [0047] The register set C 22 includes a plurality of Boolean registers 54 

W (RC[3 1 :0]). RC[ ] is also known as the "condition status register" (CSR). The 

01 

s Boolean registers RC[ ] are each identical in structure and addressing, albeit that 

o 

y> each is individually addressable at a unique address or offset within RC[ ]. 

m [0048] In one embodiment, register set C further includes a "previous condition 

O status register" (PCSR) 60, and the register set C also includes a CSR selector unit 

ft! 

62, which is responsive to the mode control unit 44 to select alternatively between 
the CSR 54 and the PCSR 60. In the one embodiment, the CSR is used when 
interrupts are enabled, and the PCSR is used when interrupts are disabled. The 
CSR and PCSR are identical in all other respects. In the one embodiment, when 
interrupts are set to be disabled, the CSR selector unit 62 pushes the contents of 
the CSR into the PCSR, overwriting the former contents of the PCSR, and when 
interrupts are re-enabled, the CSR selector unit 62 pops the contents of the PCSR 
back into the CSR. In other embodiments it may be desirable to merely alternate 
access between the CSR and the PCSR, as is done with RA[31:24] and 
RT[3 1 :24] . In any event, the PCSR is always available as a thirty-two-bit "special 
register." 

[0049] None of the Boolean registers is a dedicated condition flag, unlike the 

Boolean registers in previously known microprocessors. That is, the CSR 54 does 
not include a dedicated carry flag, nor a dedicated a minus flag, nor a dedicated 
flag indicating equality of a comparison or a zero subtraction result. Rather, any 
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Boolean register may be the destination of the Boolean result of any Boolean 
operation. As with the other register sets, a first Boolean register 58 (RC[0]) 
always contains the value zero.* to obtain the advantages explained above for 
RA[0]. In the preferred embodiment, each Boolean register is one bit wide, 
indicating one Boolean value. 

II . Execution Engine 

The execution engine 14 includes one or more integer functional units 66, 
one or more floating point functional units 68, and one or more Boolean 
functional units 70. The functional units execute instructions as will be explained 
below. Buses 72, 73, and 75 connect the various elements of the IEU 1 0, and will 
each be understood to represent data, address, and control paths. 

A. Instruction Format 

FIG. 7 illustrates one exemplary format for an integer instruction which the 
execution engine 14 may execute. It will be understood that not all instructions 
need to adhere strictly to the illustrated format, and that the data processing 
system includes an instruction fetcher and decoder (not shown) which are adapted 
to operate upon varying format instructions. The single example of FIG. 7 is for 
ease in explanation only. Throughout this application the identification I[ ] will 
be used to identify various bits of the instruction. I [3 1 :30] are reserved for future 
implementations of the execution engine 1 4. 1 [29 :26] identify the instruction class 
of the particular instruction. Table 1 shows the various classes of instructions 
performed by the present invention. 





TABLE 1 




Instruction Classes 


Class 


Instructions 
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0-3 


Integer and floating point register-to-register instructions 


4 


Immediate constant load 


5 


Reserved 


6 


Load 


7 


Store 


8-11 


Control Flow 


12 


Modifier 


13 


Boolean operations 


14 


Reserved 


15 


Atomic (extended) 



H [0052] Instruction classes of particular interest to this Application include the 

yj Class 0-3 register-to-register instructions and the Class 13 Boolean operations. 

H While other classes of instructions also operate upon the register file 12, further 

Ins ? 

3 y discussion of those classes is not believed necessary in order to fully understand 

the present invention. 

[0053] I[25] is identified as B0, and indicates whether the destination register is 

in register set A or register set B. I [24:22] are an opcode which identifies, within 
the given instruction class, which specific function is to be performed. For 
example, within the register-to-register classes an opcode may specify "addition." 
I [21] identifies the addressing mode which is to be used when performing the 
instruction-either register source addressing or immediate source addressing. 
I[20:16] identify the destination register as an offset within the register set 
indicated by B0. I[l 5] is identified as B 1 and indicates whether the first operand 
is to be taken from register set A or register set B. I[14:10] identify the register 
offset from which the first operand is to be taken. I [9: 8] identify a function 
selection— an extension of the opcode I[24:22]. I[7:6] are reserved. I[5] is 
identified as B2 and indicates whether a second operand of the instruction is to be 
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taken from register set A or register set B. Finally, I[4:0] identify the register 
offset from which the second operand is to be taken. 
[0054] With reference to FIG. 1 , the integer functional unit 66 and floating point 

functional unit 68 are equipped to perform integer comparison instructions and 
floating point comparisons, respectively. The instruction format for the 
comparison instruction is substantially identical to that shown in FIG. 7, with the 
- , caveat that various fields may advantageously be identified by slightly different 

O names. I[20: 1 6] identifies the destination register where the result is to be stored, 

Ql but the addressing mode field I[21] does not select between register sets A or B. 

5} Rather, the addressing mode field indicates whether the second source of the 

comparison is found in a register or is immediate data. Because the comparison 
* is a Boolean type instruction, the destination register is always found in register 

Jt set C. All other fields function as shown in FIG. 7. In performing Boolean 

^ operations within the integer and floating point functional units, the opcode and 

Q function select fields identify which Boolean condition is to be tested for in 

comparing the two operands. The integer and the floating point functional units 
fully support the IEEE standards for numerical comparisons. 
[0055] The IEU 1 0 is a load/store machine. This means that when the contents 

of a register are stored to memory or read from memory, an address calculation 
must be performed in order to determine which location in memory is to be the 
source or the destination of the store or load, respectively. When this is the case, 
the destination register field I [20 : 1 6] identifies the register which is the destination 
or the source of the load or store, respectively. The source register 1 field, 
I[14: 10], identifies a register in either set A or B which contains a base address of 
the memory location. In one embodiment, the source register 2 field, I [4:0], 
identifies a register in set A or set B which contains an index or an offset from the 
base. The load/store address is calculated by adding the index to the base. In 
another mode, I[7:0] include immediate data which are to be added as an index to 
the base. 
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B. Operation of the Instruction Execution Unit and Register Sets 

[0056] It will be understood by those skilled in the art that the integer functional 

unit 66, the floating point functional unit 68, and the Boolean functional unit 70 
are responsive to the contents of the instruction class field, the opcode field, and 
the function select field of a present instruction being executed. 

1 . Integer Operations 

[0057] For example, when the instruction class, the opcode, and function select 

indicate that an integer register-to-register addition is to be performed, the integer 
functional unit may be responsive thereto to perform the indicated operation, while 
the floating point functional unit and the Boolean functional unit may be 
responsive thereto to not perform the operation. As will be understood from the 
cross-referenced applications, however, the floating point functional unit 68 is 
equipped to perform both floating point and integer operations. Also, the 
functional units are constructed to each perform more than one instruction 
simultaneously. 

[0058] The integer functional unit 66 performs integer functions only. Integer 

operations typically involve a first source, a second source, and a destination. A 
given integer instruction will specify a particular operation to be performed on one 
or more source operands and will specify that the result of the integer operation 
is to be stored at a given destination. In some instructions, such as address 
calculations employed in load/store operations, the sources are utilized as a base 
and an index. The integer functional unit 66 is coupled to a first bus 72 over 
which the integer functional unit 66 is connected to a switching and multiplexing 
control (SMC) unit A 74 and an SMC unit B 76. Each integer instruction 
executed by the integer functional unit 66 will specify whether each of its : sources 
and destination reside in register set A or register set B. 
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[0059] Suppose that the IEU 1 0 has received, from the instruction fetch unit (not 

shown), an instruction to perform an integer register-to-register addition. In 
various embodiments, the instruction may specify a register bank, perhaps even a 
separate bank for each source and destination. In one embodiment, the instruction 
I[ ] is limited to a thirty-two-bit length, and does not contain any indication of 
which register bank 16-0 through 16-n is involved in the instruction. Rather, the 
bank selector unit 78 controls which register bank is presently active. In one 
embodiment, the bank selector unit 78 is responsive to one or more bank selection 
bits in a status word (not shown) within the IEU 10. 
5 [0060] In order to perform the integer addition instruction, the integer functional 

5 unit 66 is responsive to the identification in I[14:10] and I[4:0] of the first and 

second source registers. The integer functional unit 66 places an identification of 
the first and second source registers at ports SI and S2, respectively, onto the 
integer functional unit bus 72 which is coupled to both SMC units A and B 74 and 
76. In one embodiment, the SMC units A and B are each coupled to receive B 0-2 
from the instruction I [ ] . In one embodiment, a zero in any respective Bn indicates 
register set A, and a one indicates register set B. During load/store operations, 
the source ports of the integer and floating point functional units 66 and 68 are 
utilized as a base port and an index port, B and I, respectively. 
[0061] After obtaining the first and second operands from the indicated register 

sets on the bus 72, as explained below, the integer functional unit 66 performs the 
indicated operation upon those operands, and provides the result at port D onto 
the integer functional unit bus 72. The SMC units A and B are responsive to B0 
to route the result to the appropriate register set A or B. 
[0062] The SMC unit B is further responsive to the instruction class, opcode, and 

function selection to control whether operands are read from (or results are stored 
to) either a floating point register RF[ ] or an integer register RB[ ]. As indicated, 
in one embodiment, the registers RF[ ] may be sixty-four bits wide while the 
registers are RB[ ] are only thirty-two bits wide. Thus, SMC unit B controls 
whether a word or a double word is written to the register set RFB[ ]. Because 
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all registers within register set A are thirty-two bits wide, SMC unit A need not 
include means for controlling the width of data transfer on the bus 42. 
[0063] All data on the bus 42 are thirty-two bits wide, but other sorts of 

complexities exist within register set A. The IEU mode integer switch 34 is 
responsive to the mode control unit 44 of the execution engine 14 to control 
whether data on the bus 42 are connected through to bus 36, bus 38 or bus 40, 
and vice versa. 

[0064] IEU mode integer switch 34 is further responsive to I[20:16], I[14:10], 

and I[4:0]. If a given indicated destination or source is in RA[23:0], the IEU 
mode integer switch 34 automatically couples the data between lines 42 and 36. 
However, for registers RA[31:24], the IEU mode integer switch 34 determines 
whether data on line 42 is connected to line 38 or line 40, and vice versa. When 
interrupts are enabled, IEU mode integer switch 34 connects the SMC unit A to 
the second subset 28 of integer registers RA[31:24]. When interrupts are 
disabled, the IEU mode integer switch 34 connects the SMC unit A to the shadow 
registers RT[3 1 :24]. Thus, an instruction executing within the integer functional 
unit 66 need not be concerned with whether to address RA[3 1 :24] or RT[3 1 :24] . 
It will be understood that SMC unit A may advantageously operate identically 
whether it is being accessed by the integer functional unit 66 or by the floating 
point functional unit 68, 

2. Floating Point Operations 

[0065] The floating point functional unit 68 is responsive to the class, opcode, and 

function select fields of the instruction, to perform floating point operations. The 
S 1 , S2, and D ports operate as described for the integer functional unit 66. SMC 
unit B is responsive to retrieve floating point operands from, and to write 
numerical floating point results to, the floating point registers RF[ ] on bus 52. 

3. Boolean Operations 
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[0066] SMC unit C 80 is responsive to the instruction class, opcode, and function 

select fields of the instruction I[ ]. When SMC unit C detects that a comparison 
operation has been performed by one of the numerical functional units 66 or 68, 
it writes the Boolean result over bus 56 to the Boolean register indicated at the D 
port of the functional unit which performed the comparison. 

[0067] The Boolean functional unit 70 does not perform comparison instructions 

as do the integer and floating point functional units 66 and 68. Rather, the 
Boolean functional unit 70 is only used in performing bitwise logical combination 
of Boolean registers contents, according to the Boolean functions listed in 
Table 2. 



TABLE 2 
Boolean Functions 


I[23,22,9,8] 


Boolean result calculation 


0000 


ZERO 


0001 


SI AND S2 


0010 


SI AND (NOT S2) 


0011 


SI 


0100 


(NOT SI) AND S2 


0101 


S2 


0110 


SI XOR S2 


0111 


SI OR S2 


1000 


SI NOR S2 


1001 


SI XNOR S2 


1010 


NOT S2 


1011 


SI OR(NOTS2) 


1100 


NOT SI 


1101 


(NOT SI) OR S2 
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1110 


SI NAND S2 


1111 


ONE 



iT=5 



[0068] The advantage which the present invention obtains by having a plurality 

of homogenous Boolean registers, each of which is individually addressable as the 
destination of a Boolean operation, will be explained with reference to Tables 3-5 . 
Table 3 illustrates an example of a segment of code which performs a conditional 
branch based upon a complex Boolean function. The complex Boolean function 
includes three portions which are OR-ed together. The first portion includes two 
sub-portions, which are AND-ed together. 



s 




TABLE 3 


o 

2 - 




Example of Complex Boolean Function 


3.1 

i " 


1 


RA[1] : = 0; 


3 : 
>8ptf 

m 


2 


IF (((RA[2] = RA[3]) AND (RA[4] > RA[5])) OR 




3 


(RA[6] < RA[7]) OR 




4 


(RA[8] <> RA[9])) THEN 




5 


X() 




6 


ELSE 




7 


Y(); 




8 


RA[10] : = 1; 



[0069] Table 4 illustrates, in pseudo-assembly form, one likely method by which 

previous microprocessors would perform the function of Table 3. The code in 
Table 4 is written as though it were constructed by a compiler of at least normal 
intelligence operating upon the code of Table 3. That is, the compiler will 
recognize that the condition expressed in lines 2-4 of Table 3 is passed if any of 
the three portions is true. 
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TABLE 4 

Execution of Complex Boolean Function 
Without Boolean Register Set 


1 


START 


LDI 


RA[1],0 


2 


TEST1 


CMP 


RA[2],RA[3] 


3 




BNE 


TEST2 


4 






R AT41 RAT51 


5 




BGT 


DO-IF 


6 


ibS 12 






7 




BLT 


DO-IF 


8 


TEST3 


CMP 


RA[8],RA[9] 


9 




BEQ 


DO-ELSE 


10. 


DO-IF 


JSR 


ADDRESS OF X( ) 


11 




JMP 


PAST-ELSE 


12 


DO-ELSE 


JSR 


ADDRESS OF Y( ) 


13 


PAST-ELSE 


LDI 


RA[10],1 



The assignment at line 1 of Table 3 is performed by the "load immediate" 
statement at line 1 ofTable4. The first portion ofthe complex Boolean condition, 
expressed at line 2 of Table 3, is represented by the statements in lines 2-5 of 
Table 4. To test whether RA[2] equals RA[3], the compare statement at line 2 of 
Table 4 performs a subtraction of RA[2] from RA[3] or vice versa, depending 
upon the implementation, and may or may not store the result of that subtraction. 
The important function performed by the comparison statement is that the zero, 
minus, and carry flags will be appropriately set or cleared. 
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[0071] The conditional branch statement at line 3 of Table 4 branches to a 

subsequent portion of code upon the condition that RA[2] did not equal RA[3]. 
If the two were unequal, the zero flag will be clear, and there is no need to 
perform the second sub-portion. The existence of the conditional branch 
statement at line 3 of Table 4 prevents the further fetching, decoding, and 
executing of any subsequent statement in Table 4 until the results of the 
comparison in line 2 are known, causing a pipeline stall. If the first sub-portion 
q of the first portion (TEST 1 ) is passed, the second sub-portion at line 4 of Table 

2 4 then compares RA[4] to RA[5], again setting and clearing the appropriate status 

O flags. 

5 [0072] If RA[2] equals RA[3], and RA[4] is greater than RA[5], there is no need 

to test the remaining two portions (TEST2 and TEST3) in the complex Boolean 
function, and the statement at Table 4, line 5, will conditionally branch to the label 
DO-IF, to perform the operation inside the "IF" of Table 3. However, if the first 
portion of the test is failed, additional processing is required to determine which 
of the "IF" and "ELSE" portions should be executed. 
[0073] The second portion of the Boolean function is the comparison of RA[6] 

to RA[7], at line 6 of Table 4, which again sets and clears the appropriate status 
flags. If the condition "less than" is indicated by the status flags, the complex 
Boolean function is passed, and execution may immediately branch to the DO-IF 
label. In various prior microprocessors, the "less than" condition may be tested 
by examining the minus flag. If RA[7] was not less than RA[6], the third portion 
of the test must be performed. The statement at line 8 of Table 4 compares RA[8] 
to RA[9]. If this comparison is failed, the "ELSE" code should be executed; 
otherwise, execution may simply fall through to the "IF" code at line 10 of 
Table 4, which is followed by an additional jump around the "ELSE" code. Each 
of the conditional branches in Table 4, at lines 3, 5, 7 and 9, results in a separate 
pipeline stall, significantly increasing the processing time required for handling this 
complex Boolean function. 
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The greatly improved throughput which results from employing the 
Boolean register set C of the present invention will now readily be seen with 
specific reference to Table 5. 



TABLE 5 

Execution of Complex Boolean Function 
With Boolean Register Set 


1 


START 


LDI 


RA[1],0 


2 


TEST1 


CMP 


RC[1 1],RA[2],RA[3],EQ 


3 




CMP 


RC[12],RA[4],RA[5],GT 


4 


TEST2 


CMP 


RC[13],RA[6],RA[7],LT 


5 


TEST3 


CMP 


RC[14],RA[8],RA[9],NE 


6 


COMPLEX 


AND 


RC[15],RC[11],RC[12] 


7 




OR 


RC[16],RC[13],RC[14] 


8 




OR 


RC[17],RC[15],RC[16] 


9 




BC 


RC[17],DO-ELSE 


10 


DO-IF 


JSR 


ADDRESS OF X( ) 


11 




JMP 


PAST-ELSE 


12 


DO-ELSE 


JSR 


ADDRESS OF Y( ) 


13 


PAST-ELSE 


LDI 


RA[10],1 



[0075] Most notably seen at lines 2-5 of Table 5 , the Boolean register set C allows 

the microprocessor to perform the three test portions back-to-back without 
intervening branching. Each Boolean comparison specifies two operands, a 
destination, and a Boolean condition for which to test. For example, the 
comparison at line 2 of Table 5 compares the contents of RA[2] to the contents 
of RA[3], tests them for equality, and stores into RC[1 1] the Boolean value of the 
result of the comparison. Note that each comparison of the Boolean function 
stores its respective intermediate results in a separate Boolean register. As will 
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be understood with reference to the above-referenced related applications, the 
IEU 10 is capable of simultaneously performing more than one of the 
comparisons. 

[0076] After at least the first two comparisons at lines 2-3 of Table 5 have been 

completed, the two respective comparison results are AND-ed together as shown 
at line 6 of Table 3. RC[15] then holds the result of the first portion of the test. 
The results of the second and third sub-portions of the Boolean function are OR- 
ed together as seen in Table 5, line 7. It will be understood that, because there are 
no data dependencies involved, the AND at line 6 and the OR-ed in line 7 may be 
performed in parallel. Finally, the results of those two operations are OR-ed 
together as seen at line 8 of Table 5. It will be understood that register RC[17] 
will then contain a Boolean value indicating the truth or falsity of the entire 
complex Boolean function of Table 3. It is then possible to perform a single 
conditional branch, shown at line 9 of Table 5. In the mode shown in Table 5, the 
method branches to the "ELSE" code if Boolean register RC[17] is clear, 
indicating that the complex function was failed. The remainder of the code may 
be the same as it was without the Boolean register set as seen in Table 4. 

[0077] The Boolean functional unit 70 is responsive to the instruction class, 

opcode, and function select fields as are the other functional units. Thus, it will 
be understood with reference to Table 5 again, that the integer and/or floating 
point functional units will perform the instructions in lines 1-5 and 13, and the 
Boolean functional unit 70 will perform the Boolean bitwise combination 
instructions in lines 6-8. The control flow and branching instructions in line 9- 1 2 
will be performed by elements of the IEU 10 which are not shown in FIG. 1 . 

III. Data Paths 



[0078] FIGS. 2-5 illustrate further details of the data paths within the floating 

point, integer, and Boolean portions of the IEU, respectively. 
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A. Floating Point Portion Data Paths 



[0079] As seen in FIG. 2, the register set FB 20 is a multi-ported register set. In 

one embodiment, the register set FB 20 has two write ports WFB0-1, and five 
read ports RDFBO-4. The-floating point functional unit 68 of FIG. 1 is comprised 
of the ALU2 102, FALU 104, MULT 106, and NULL 108 of FIG. 2. All 
elements of FIG. 2 except the register set 20 and the elements 102-108 comprise 
the SMC unit B of FIG. 1. 
[0080] External, bidirectional data bus EX-DATA[ ] provides data to the floating 

5f point load/store unit 122. Immediate floating point data bus LDF_IMED[ ] 

U 

Gf provides data from a "load immediate" instruction. Other immediate floating point 

01 

J data are provided on busses RFF1JMED and RFF2JMED, such as is involved 

!*? in an "add immediate" instruction. Data are also provided on bus EX SR DT[ ], 

W in response to a "special register move" instruction. Data may also arrive from the 

p integer portion, shown in FIG. 3, on busses 1 14 and 120. 

[0081] The floating point register set's two write ports WFB0 and WFB1 are 

coupled to write multiplexers 110-0 and 110-1, respectively. The write 
multiplexers 110 receive data from: the ALU0 or SHF0 of the integer portion of 
FIG. 3; the FALU; the MULT; the ALU2; either EX__SR_DT[ ] or LDF-IMED[ ] ; 
and EX_DATA[ ]. Those skilled in the art will understand that control signals 
(not shown) determine which input is selected at each port, and address signals 
(not shown) determine to Which register the input data are written. Multiplexer 
control and register addressing are within the skill of persons in the art, and will 
not be discussed for any multiplexer or register set in the present invention. 
[0082] The floating point register set's five read ports RDFB0 to RDFB4 are 

coupled to read multiplexers 11 2-0 to 112-4, respectively. The read multiplexers 
each also receives data from: either EX_SRJDT[ ] or LDF_IMED[ ], on load 
immediate bypass bus 126; a load external data bypass bus 127, which allows 
external load data to skip the register set FB; the output of the ALU2 1 02, which 
performs non-multiplication integer operations; the FALU 104, which performs 
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non-multiplication floating point operations; the MULT 106, which performs 
multiplication operations; and either the ALUO 1 40 or the SHFO 1 44 of the integer 
portion shown in FIG. 3, which respectively perform non-multiplication integer 
operations and shift operations. Read multiplexers 112-1 and 1 12-3 also receive 
data from RFF1 JMED[ ] and RFF2 JMED[ ], respectively. 

[0083] Each arithmetic-type unit 102-106 in the floating point portion receives 

two inputs, from respective sets of first and second source multiplexers SI and S2. 
The first source of each unit ALU2, FALU, and MULT comes from the output of 
either read multiplexer 1 12-0 or 1 12-2, and the second source comes from the 
output of either read multiplexer 1 12-1 or 1 12-3. The sources of the FALU and 
the MULT may also come from the integer portion of FIG. 3 on bus 114. 

[0084] The results of the ALU2, FALU, and MULT are provided back to the 

write multiplexers 1 1 0 for storage into the floating point registers RF[ ], and also 
to the read multiplexers 1 1 2 for re-use as operands of subsequent operations. The 
FALU also outputs a signal F ALUBD indicating the Boolean result of a floating 
point comparison operation. FALU_BD is calculated directly from internal zero 
and sign flags of the FALU. 

[0085] Null byte tester NULL 108 performs null byte testing operations upon an 

operand from a first source multiplexer, in one mode that of the ALU2. NULL 
1 08 outputs a Boolean signal NULLB JBD indicating whether the thirty -two-bit 
first source operand includes a byte of value zero. 

[0086] The outputs of read multiplexers 1 12-0, 112-1, and 1 1 2-4 are provided to 

the integer portion (of FIG. 3) on bus 118. The output of read multiplexer 112-4 
is also provided as STDT_FP[ ] store data to the floating point load/store unit 
122. 

[0087] FIG. 5 illustrates further details of the control of the SI and S2 

multiplexers. As seen, in one embodiment, each S 1 multiplexer may be responsive 
to bit B 1 of the instruction I[ ], and each S2 multiplexer may be responsive to bit 
B2 of the instruction I[ ]. The S 1 and S2 multiplexers select the sources for the 
various functional units. The sources may come from either of the register files, 
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as controlled by the Bl and B2 bits of the instruction itself. Additionally, each 
register file includes two read ports from which the sources may come, as 
controlled by hardware not shown in the Figs. 

B. Integer Portion Data Paths 

[0088] As seen in FIG. 3, the register set A 18 is also multi-ported. In one 

embodiment, the register set A 1 8 has two write ports WA0-1, and five read ports 
RDAO-4. The integer functional unit 66 of FIG. 1 is comprised of the ALUO 140, 
ALU1 142, SHFO 144, and NULL 146 of FIG. 3. All elements of FIG. 3 except 
the register set 1 8 and the elements 140-146 comprise the SMC unit A of FIG. 1 . 

[0089] External data bus EX_DATA[ ] provides data to the integer load/store unit 

1 52. Immediate integer data on bus LDI_IMED[ ] are provided in response to a 
"load immediate" instruction. Other immediate integer data are provided on 
busses RFA1_IMED and RFA2_IMED in response to non-load immediate 
instructions, such as an "add immediate." Data are also provided on bus 
EX_SR_DT[ ] in response to a "special register move" instruction. Data may also 
arrive from the floating point portion (shown in FIG. 2) on busses 116 and 118. 

[0090] The integer register set's two write ports WA0 and WA1 are coupled to 

write multiplexers 148-0 and 148-1, respectively. The write multiplexers 148 
receive data from: the FALU or MULT of the floating point portion (of FIG. 2); 
the ALUO; the ALU1; the SHFO; either EX_SR_DT[ ] or LDI_IMED[ ]; and 
EX_DATA[ ]. 

[0091] The integer register set's five read ports RDA0 to RDA4 are coupled to 

read multiplexers 150-0 to 150-4, respectively. Each read multiplexer also 
receives data from: either EX_SR_DT[ ] or LDI_IMED[ ] on load immediate 
bypass bus 160; a load external data bypass bus 154, which allows external load 
data to skip the register set A; ALUO; ALU1 ; SHFO; and either the FALU or the 
MULT of the floating point portion (of FIG. 2). Read multiplexers 150-1 and 
150-3 also receive data from RFA1_IMED[ ] and RFA2_IMED[ ], respectively. 
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[0092] Each arithmetic-type unit 140-144 in the integer portion receives two 

inputs, from respective sets of first and second source multiplexers SI and S2. 
The first source of ALUO comes from either the output of read multiplexer 1 50-2, 
or a thirty-two-bit wide constant zero (0000 hex ) or floating point read multiplexer 
1 12-4. The second source of ALUO comes from either read multiplexer 1 50-3 or 
floating point read multiplexer 112-1. The first source of ALU 1 comes from 
either read multiplexer 150-0 or IFJPC[ ]. IF_PC[ ] is used in calculating a return 
address needed by the instruction fetch unit (not shown), due to the IEU's ability 
to perform instructions in an out-of-order sequence. The second source of ALU1 
comes from either read multiplexer 1 50-1 or CF_OFFSET[ ]. CF_OFFSET[ ] is 
used in calculating a return address for a CALL instruction, also due to the out-of- 
order capability. 

[0093] The first source of the shifter SHF0 1 44 is from either: floating point read 

multiplexer 112-0 or 112-4; or any-integer read multiplexer 150. The second 
source of SHF0 is from either: floating point read multiplexer 1 12-0 or 1 12-4; or 
integer read multiplexer-1 50-0, 1 50-2, or 1 50-4. SHF0 takes a third input from 
a shift amount multiplexer (SA). The third input controls how far to shift, and is 
taken by the SA multiplexer from either: floating point read multiplexer 112-1; 
integer read multiplexer 150-1 or 150-3; or a five-bit wide constant thirty-one 
(11111 2 or 31 10 ). The shifter SHF0 requires a fourth input from the size 
multiplexer (S) . The fourth input controls how much data to shift, and is taken by 
the S multiplexer from either: read multiplexer 150-1; read multiplexer 1 50-3 ; or 
a five-bit wide constant sixteen (10000 2 or 16 10 ). 

[0094] The results of the ALUO, ALU 1 , and SHF0 are provided back to the write 

multiplexers 148 for storage into the integer registers RA[ ], and also to the read 
multiplexers 1 50 for re-use as operands of subsequent operations. The output of 
either ALUO or SHF0 is provided on bus 120 to the floating point portion of 
FIG. 3. The ALUO and ALU1 also output signals ALU0 BD and ALU1JBD, 
respectively, indicating the Boolean results of integer comparison operations. 
ALU0JBD and ALU1BD are calculated directly from the zero and sign flags of 
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the respective functional units. ALUO also outputs signals EX_TADR[ ] and 
EXVMADR. EX_TADR[ ] is the target address generated for an absolute 
branch instruction, and is sent to the IFU (not shown) for fetching the target 
instruction. EX_VM_ADR[ ] is the virtual address used for all loads from 
memory and stores to memory, and is sent to the VMU (not shown) for address 
translation. 

[0095] Null byte tester NULL 1 46 performs null byte testing operations upon an 

operand from a first source multiplexer. In one embodiment, the operand is from 
the ALUO. NULL 146 outputs a Boolean signal NULLA_BD indicating whether 
the thirty-two-bit first source operand includes a byte of value zero. 
[0096] The outputs of read multiplexers 150-0 and 150-1 are provided to the 

floating point portion (of FIG. 2) on bus 1 1 4. The output of read multiplexer 1 50- 
4 is also provided as STDT _INT[ ] store data to the integer load/store unit 152. 
[0097] A control bit PSR[7] is provided to the register set A 1 8. It is this signal 

which, in FIG. 1, is provided from the mode control unit 44 to the IEU mode 
integer switch 34 on line 46. The IEU mode integer switch is internal to the 
register set A 18 as shown in FIG. 3. 
[0098] FIG. 6 illustrates further details of the control of the SI and S2 

multiplexers. 



Q 



C. Boolean Portion Data Paths 



[0099] As seen in FIG. 4, the register set C 22 is also multi-ported. In one 

embodiment, the register set C 22 has two write ports WC0-1 , and five read ports 
RDA0-4. All elements of FIG. 4 except the register set 22 and the Boolean 
combinational unit 70 comprise the SMC unit C of FIG. 1. 

[0100] The Boolean register set's two write ports WC0 and WC1 are coupled to 

write multiplexers 170-0 and 170-1, respectively. The write multiplexers 170 
receive data from: the output of the Boolean combinational unit 70, indicating the 
Boolean result of a Boolean combinational operation; ALUO JBD from the integer 



-31- 



portion of FIG. 3, indicating the Boolean result of an integer comparison; 
FALUBD from the floating point portion of FIG. 2, indicating the Boolean result 
of a floating point comparison; either ALU1 JBD_P from ALU1, indicating the 
results of a compare instruction in ALU1, or NULLA BD from NULL 146, 
indicating a null byte in the integer portion; and either ALU2 BD P from ALU2, 
indicating the results of a compare operation in ALU2, or NULLB_BD from 
NULL 108, indicating a null byte in the floating point portion. In one mode, the 
n ALUO BD, ALU1_BD, ALU2 BD, and FALU BD signals are not taken from 

2 the data paths, but are calculated as a function of the zero flag, minus flag, carry 

y * 

O flag, and other condition flags in the PSR. In one mode, wherein up to eight 

Q 

gj instructions may be executing at one instant in the IEU, the IEU maintains up to 

m eight PSRs. 

O [0101] The Boolean register set C is also coupled to bus EX_SR_DT[ ], for use 

hi with "special register move" instructions. The CSR may be written or read as a 

whole, as though it were a single thirty-two-bit register. This enables rapid saving 
and restoration of machine state information, such as may be necessary upon 
certain drastic system errors or upon certain forms of grand scale context 
switching. 

[0102] The Boolean register set's five read ports RDC0 to RDC3 are coupled to 

read multiplexers 172-0 to 172-4, respectively. The read multiplexers 172 receive 
the same set of inputs as the write multiplexers 170 receive. The Boolean 
combinational unit 70 receives inputs from read multiplexers 170-0 and 170-1. 
Read multiplexers 172-2 and 172-3 respectively provide signals BLBP_CPORT 
and BLBP_DPORT. BLBP_CPORT is used as the basis for conditional branching 
instructions in the IEU. BLBPJDPORT is used in the "add with Boolean" 
instruction, which sets an integer register in the A or B set to zero or one (with 
leading zeroes), depending upon the content of a register in the C set. Read port 
RDC4 is presently unused, and is reserved for future enhancements of the Boolean 
functionality of the IEU. 
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IV. Conclusion 



M 
m 



[0103] While the features and advantages of the present invention have been 

described with respect to particular embodiments thereof, and in varying degrees 
of detail, it will be appreciated that the invention is not limited to the described 
embodiments. The following Claims define the invention to be afforded patent 
coverage. 
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